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Abstract. Many random processes can be simulated as the output of a deterministic model ac- 
cepting random inputs. Such a model usually describes a complex mathematical or physical stochastic 
system and the randomness is introduced in the input variables of the model. When the statistics 
of the output event are known, these input variables have to be chosen in a specific way for the 
output to have the prescribed statistics. Because the probability distribution of the input random 
variables is not directly known but dictated implicitly by the statistics of the output random vari- 
ables, this problem is usually intractable for classical sampling methods. Based on Markov Chain 
Monte Carlo we propose a novel method to sample random inputs to such models by introducing a 
modification to the standard Metropolis-Hastings algorithm. As an example we consider a system 
described by a stochastic differential equation (sde) and demonstrate how sample paths of a random 
process satisfying this sde can be generated with our technique. 
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1. Introduction. Most algorithms employed to sample from complicated prob- 
ability distributions such as rejection sampling and importance sampling assume full 
knowledge of the target density [17]. Contrary to these approaches Markov Chain 
Monte Carlo (MCMC) methods can be used to sample from distributions for which 
the form of the density function is known, but the function value itself can only be 
evaluated up to a scalar constant. 

MCMC algorithms devise a Markov chain on the sample space of a general vec- 
tor random variable. In typical settings the probability density function (pdf) of 
the distribution can only be evaluated up to a normalizing constant. The common 
Metropolis algorithm [11] starts with an initial state and generates samples of the 
random variable iteratively. At every step of the procedure a new state is proposed 
according to some proposal distribution. This proposal state is then accepted with a 
probability determined by the ratio of the pdf values for the new state and the old 
state. Because the accept-reject rule only requires the evaluation of the ratio of the 
probability densities for the proposed and the old state, it is sufficient to know the 
target pdf up to a scalar constant. The sole restriction of the Metropolis algorithm is 
that the proposal density is symmetric and simple enough to sample directly. 

One generalization of the Metropolis algorithm is the Metropolis-Hastings algo- 
rithm 7\ which can employ non-symmetric proposal densities. To achieve this, the 
acceptance probability is modified to incorporate a ratio of the proposal density val- 
ues. 

MCMC methods are very general tools in regard to dealing with intractable prob- 
abilistic settings. This generality allows MCMC to be integrated into many practical 
problems in diverse fields like computational biology [9], statistical physics [8], ran- 
dom number generation |16[l6]. artificial intelligence [1] and many more. A review for 
the applications of MCMC can be found in [13] . 
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Fig. 1.1. Graphical representation of a deterministic mapping with random input and output. 
The output of the complex system h, Y = /i(X), is a random vector of dimension two while the 
input X is a random vector with three dimensions. 

One of these broad applications deals with the model selection problem where 
one tries to choose a model among many competing models that is more likely to 
have generated the given probabilistic output data 5 . In this paper we investigate a 
related problem in which the model that generates the given data is fixed, but accepts 
a random input with an unknown pdf. This situation arises naturally in the context 
of complex models which accept random inputs either as the data to be processed or 
as random model parameters. The realization of these models in practice can be in 
various ways such as lengthy and complicated computer routines, a set of involved 
mathematical equations or any kind of black box evaluation. Nevertheless they can be 
viewed as a mapping of random variables as illustrated in figure 11.11 If h : l n — > K m 
is a multi- valued function of multiple variables with Y = h(X) and its inverse h^ 1 
does not exist or cannot be computed analytically then the question arises: How does 
one choose X for Y to have the desired probability density /vd? 

The answer to this question is not straightforward. First of all, generally h is very 
complicated and all we have is some kind of routine that evaluates it for a given input. 
Therefore the unknown density of X can be computed through the inverse mapping 
only in special cases where h is known explicitly and m = n with the Jacobian of h 
is globally invertible. The use of standard sampling algorithms including MCMC to 
sample X is for this reason not possible. Additionally more than one input distribution 
can generate the desired output distribution creating an issue of non-uniqueness. 

To address this problem we first review some Markov chain and MCMC theory 
then we provide a detailed description of the problem at hand and a toy example to 
demonstrate the concept before proceeding to develop a solution. Finally we conclude 
our discussion with a numerical example of a stochastic differential equation and 
demonstrate how our method can be used to sample from the space of solution paths 
to this equation. 

2. Background. In this section we present some elementary Markov chain and 
MCMC theory which will be required for later discussion. 

Definition 2.1. A sequence of indexed random variables Xj, i G N is called a 
Markov chain if the following property holds for every measurable set A C M™ 



T(x, A) = Pr(X.; + i G -4.|Xj = x) is called its transition kernel with the transition 
density r(x, x'), where 



Pr(X i+1 e^|X i ,X i _ 1 ,...,X ) = Pr(X i+1 e^Xi) . 



(2.1) 
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Definition 2.2. /(x) is called a stationary distribution of the Markov chain if 
it satisfies 

/(x') = J /(x)r(x,x')dx. (2.3) 

Lemma 2.3. A sufficient condition for /(x) to be a stationary distribution of the 
Markov chain X, is the detailed balance equation: 

/(x)r(x, x') = /(x')r(x', x), x, x' e M™ . (2.4) 

Proof. Integrating both sides of equation \2.1$ and using Definition \2.2l 

J /(x)r(x,x')dx = J /(x')r(x',x)dx 

R™ R™ 

= /(x') J r(x',x)dx 

R» 

= /(x) (2.5) 

□ 

Note that Lemma 12.31 gives a sufficient condition, and the necessary condition is 
much looser [T2] . Furthermore under certain conditions the stationary distribution is 
unique [2J. 

Above definitions and Lemma [2~3l are sufficient to describe the Metropolis-Hastings 
(and the Metropolis algorithm as its special case) in a formal way. Given a target dis- 
tribution /(x) for the random vector X the strategy of the algorithm is to construct 
a Markov chain on the state space of interest, K n , and choose a transition kernel such 
that the Markov chain has /(x) as its stationary distribution. This is accomplished 
in two stages. At the first stage the procedure takes a random step in the state space 
according to some proposal density p(x, x') which describes the probability of moving 
from the state, Xi = x, to the next one, X^+i = x'. Most common choice for p uses 
a form of increment on x such that x' = x + Ax. Commonly used densities for the 
random increment Ax are tractable ones like the uniform and the Gaussian density. 
At the second stage of the algorithm a decision is made whether the chain will advance 
to x' as its next state or stay at x. The decision mechanism uses the ratio yj^y in 
the decision rule 



/ /xQ px',x 

a x,x )=mm 1,——— - 

V /(x)p(x,x') 



(2.6) 



which gives the acceptance probability of the proposed move. After evaluating the 
accept-reject ratio, a random number u is sampled according to a standard uniform 
distribution and the move is accepted if u < a(x, x'). If the proposed state is not 
accepted the Markov chain remains in its previous state. Theorem 12 .41 shows that the 
distribution of the samples taken in this way indeed converges to /(x). 

Theorem 2.4. The transition kernel of the Metropolis- Hastings algorithm sat- 
isfies the detailed balance condition and /(x) is the stationary distribution of the 
resulting Markov chain. 
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Proof. The transition kernel T(x, A) can be written as the sum of two probabilities: 
The probability of an accepted step to a point x' in A and the probability of a rejection 
while the point x lies in A. 

T(x,A)= y"p(x,x')a(x,x')dx' + l {xe ^ } y"p(x,x')(l-a(x,x'))dx' 
A n 

Hence the transition density is given by 

r(x, x') = p(x, x')a(x, x') + <5 x (x')r(x) , 

where <5 x (x') is the point mass at x and r(x) = 1 — a(x, x')p(x, x')dx' is the 
probability that the chain does not leave its current position x. 

Lemma \2.3\ gives us a way of checking whether this transition kernel has the 
desired pdf as its stationary distribution. If we now check if equation \2.1$ is satisfied 
we find for the first summand of the transition density 

J(x)p(x,x>(x,x') = /(x)p(x,x')min (l ]iX ' ' '"^ X> 



/(x) p( x > x 0, 
min (/(x)p(x, x'), /(x')p(x', x)) 
Z(x)p(x,x') 

/(> 

/(x')p(x',x)a(x',x) . (2.7) 



= mm 



Finally for the second summand the requirement is trivially satisfied 

<5 x (x')r(x) = M X )K X ') , 

and this completes the proof. □ 

Further discussion of Markov chain and MCMC theory is outside the scope of 
this paper but excellent material on this subject can be found in [18] and |19j . 

3. An Illustrative Example. Now let us recap the problem described in the 
introduction. Suppose we are given a general many-to-one, non-isometric map h : 
M. n — > W n which maps a random vector X to another random vector Y = /i(X) 
where X <E R n , Y £ R ,n and let ./Vd be the desired probability density of Y. Given 
f-yd how must /x be chosen such that the transformed variable Y = /i(X) has the 
desired pdf? 

Given this setting one might be tempted to construct a Markov chain in the space 
of the input variables to sample X while evaluating the accept-reject rule probabilities 
of the Metropolis-Hastings algorithm in the space of the random output vector Y. 

As the following toy-example illustrates, this method does not result in a Markov 
chain in the space of input variables with the desired stationary distribution /vd of 
the output variables. 

Figure l3~T1 describes a discrete state space consisting of three states A = {Xi, X2, X3}. 
The arrows represent a many- to-one function h : A — > B with h(Xi) = Y\ and 
h(X2) — hiX'i) = Y% and Y\,Y% £ B. It is of no importance if B is discrete or 
continuous but the range of h is discrete for obvious reasons. 
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The desired distribution of Y is /yd(Yi) 
11.') and f Y d{Y 2 ) = 0.1. The results about 
Markov chains and the Metropolis-Hastings al- 
gorithm given in section[2]can easily be adopted 
to general finite state spaces and to this specific 
example. 

Now suppose that we are running the 
Metropolis algorithm on A with a symmetric 
proposal distribution P. For the current state 
Xi a new state is proposed according to the rule 



Pv(X j \X i )=P(X i ,X j 



Together with the accept-reject rule of 
the Metropolis algorithm, this results in a 
Markov chain with the transition probabilities 
T(Xi, Xj) = PiX^X^mm^ p}^ ) for 
i 7^ j and the transition probability matrix 
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Fig. 3.1. A toy-example to illustrate 
the problem of mapping the state variables 
X to another random variable Y with the 
desired probability distribution fyd- 




The left eigenvector of this matrix that corresponds to the eigenvalue 1 gives us the 
stationary distribution of the chain, which is (jr, tj, yj). It can be easily seen that this 
distribution does not provide the desired stationary distribution on the range of h. In 
fact this distribution corresponds to a function h' which maps X3 to a different value 
h'(Xz) which has the same probability as h'{X2). This behavior can also be observed 
with the more general Metropolis-Hastings algorithm by choosing a non-symmetric 
proposal distribution and applying the corresponding accept-reject rule. 

This toy-example illustrates clearly that to address the problem of creating the 
target probability density /Vd we have to take the properties of the mapping h into 
account and modify the Metropolis-Hastings algorithm accordingly. 

4. Modification of MCMC with a Probing Term. The reason that the 
above example fails to converge to the desired stationary distribution fyd lies within 
the properties of the general mapping h. First h is not one-to-one and hence the 
probability of a state Yi appearing in the chain on B depends on the probability of 
all the states Xj on the space of inputs for which h(Xj) = Yi holds. Additionally 
for the continuous case, even if h was one-to-one it would not necessarily be an 
isometry so that volumes are distorted under the mapping creating a similar effect 
on the stationary distribution. In this section we develop a method to overcome the 
shortcomings of MCMC sampling for the problem described in the previous section. 

In this context for the general case we first implement a probing procedure for the 
mapping h by using the output distribution that results when the input parameters 
are sampled uniformly and independently. Then we show that a modification of the 
target density with this uniform output density can be used in the space of parameters 
for the accept-reject rule in MCMC to achieve the desired density /vd on the range 
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a£h. 

Theorem 4.1. Let U be a uniform random vector on the probability space 
(f2, J 7 , Fxj) where is a bounded subset of K™ sttc/i i/ioi /u(ui) = fxj(u2) for all 
Ui,U2 € ^ wii/i /u as the probability density function of the cumulative distribution 
function F\j. And let h : M" — > R m be a mapping satisfying the required regularity 
conditions such that (f2', J-') with Q' = h(fl) is the induced sample space by h and 
the associated a-algebra. Then a random variable X € M n constructs another random 
variable Y = /i(X), Y € M m £/ie desired probability density /ycZ i/ X has the 
unnormalized probability density 

, , v / Yd (/t(x)) 

/x X OC . . rr 

/qO( x )) 

where /q «s i/ie probability density of the transformed random variable Q = /i(U). 
Proof. 

Consider the bounded sample space fl C M n , 17 = [ai, /3±] x [a2, ^2] x • • • x [a n , j3 n ] 
in which we assume /q is strictly positive. For the cumulative distribution function 
of Q we get 



F Q (q) = Pr(Q<q)=Pr({u:uef>, h(u) < q}) . (4.1) 

which can be written with the indicator function as 



F Q(q) = / l{u:k(u)<,)/u(u)du (4.2) 



Note that the indicator function in equation can be expressed with the components 

of the random vector q and the function h as 



l{u:h(u)< q } = s(qi - h 1 (u))s(q 2 - h 2 (u)) . . . s(q m - h m (u)) (4.3) 

where s is the unit step function. 

The pdf of Q is given by /cj(q) = "g^g^~§^~ ■ Using generalized functions and 
equation OTM) we can write this expression as 



/cj(q) = J S (Qi - hi{u))5{q 2 - h 2 (u)) . ..5{q m - /i m (u))/u(u)du 

n 

01 Hi 0n 

oc / ... 6(qi - h 1 (u))S(q 2 - h 2 (u)) . . . 5{q m - /i m (u))du (4.4) 



ai Q2 CK« 



If we now set the distribution of the input random variable X proportional to the 
ratio of the desired distribution o/Y and the distribution o/Q, 



(4.5) 



we have for the cumulative distribution function of Y = h(X.) 



F Y (y) = Pr(Y < y) 

= Pr({x : x e fi, /i(x) < y}) 

= y l{x:/ l (x)<y}/x(x)dx 

= J W) ^ } ^(Mx)y dx • (4 - 6) 



Finally we have for the output probability density, 



•My) = ~— « 5 — ^V(y) 

oyidy2 ■ ..oy m 



01 P2 Pn 

S( Vl - /m(x))5(j/ 2 - ft 2 (x)) . . . S(y m - fe m (x)) / 7 rf(fe(x)) dx 

/q(M x )) 



/yrf(y) 
a / Q (y) 



y 5(y - Mx))dx 



o 

/q(y) 

cx / Yrf (y) (4.7) 

Since both are normalized probability densities /y = /y<j ZioZrfs and the distribution of 
the image of samples on 1" will be equal to the desired distribution on M m . □ 

Theorem 14.11 shows that we can find the distribution of a random variable X that 
gives us the desired density /Yd through the mapping h provided that we know the uni- 
form input distribution /q. This can be accomplished by modifying the Metropolis- 
Hastings accept-reject rule in (|2.6j) as 

W-ntoL t^^m® ) - (4.8, 
V /Yd(Mx)) p(x,x') f Q (h{x!))J 



Note that Theorem 14.11 assumes a bounded support for the uniformly sampled 
random vector Q, with Q, = [oti, /3i] x [02, ^2] x ••• x [a n , /?»]■ This assumption 
implies that the support of the input vector X is equal to or a subset of f2. Therefore 
in case X has unbounded support, this technique will sample a truncated version of 
the input random vector. Nevertheless practical difficulties caused by this fact can 
be overcome with an adjustment of which theoretically can be chosen arbitrarily 
large. 

Furthermore Theorem 14.11 gives us only the unnormalizcd pdf which is sufficient 
to sample X with MCMC. But the above method can be used irrespective of the 
specific sampling method once this density is normalized. Hence we obtain a general 
method to control the input of complex systems with prescribed random outputs. 

In practical applications one will not always be able to compute /q analytically. 
In these situations /q will have to be substituted with an approximation /q . For this 
purpose one can use various density estimation schemes available. For large data sets 

7 



nonparametric schemes like kernel density estimators and nearest neighbour methods 
[5] can be used. For other settings Bayesian schemes like the EM algorithm [T3] can 
be employed for inference. 

5. An Application: Stochastic Differential Equations. In this section we 
demonstrate an example for our algorithm on stochastic differential equations. In 
this case the model is given by a differential equation driven by random noise and the 
input random variable takes the form of the solution to this equation. 

Consider the one dimensional Ito stochastic differential equation 



dX t = b(Xt,t)dt + a(X t ,t)dWt , 0<t<T (5.1) 
X = c 

where a, b : R x [0, T] — > M. are measurable functions and Wt is the Wiener process. 

A numerical treatment of this equation can be done by discretization using the 
simple Euler scheme. 



X l+1 = X i + b(X i ,t i )At + a(X i ,t i )AW i , = t < h < ■ ■ ■ < t N = T (5.2) 

We now set AXi = — X^i and define the random vectors 

AX = (AXi, AX 2 , . . . , AX N ) T and Y = ft(AX) = (Yi, Y 2 , . . . , Y N ) T with 



Yi = Ax l -w- l , tl ,)M = AWi _ i 

Note that AX together with X$ completely determines the sample path. Hence 
if we can sample AX such that it satisfies equation (|5.2I) , that means we can generate 
a solution path to the stochastic differential equation. The distribution of AX is 
unknown but we know that Y is a Gaussian random vector with i.i.d. zero mean 
components with variance At. Using this fact we can employ the modified MCMC 
algorithm to sample solution paths. 

For this general class of stochastic differential equations we can obtain the pdf 
of the output vector when the input random variables are sampled uniformly. First 
we derive the expression of the joint output distribution for the uniformly sampled 
input variables. Let AU € R w be a random vector with i.i.d. components distributed 
uniformly in [—p, p] and Q = ft(AU) G M. N another random vector with the joint pdf 
/q. We can express /q as the product of conditional pdfs as follows. 



/q(q) = /q((?ij?2,...,gjv) = f(qi)f{q2\qi)---f{qN\qN~i,qN-2,---,qi) (5.4) 

It can easily be seen from equation (|5.3[) that each of these conditional pdfs are 
uniform in a range determined by the previous values of q t . Particularly since 



w-w^w Ut = Ut _ l+AUk 

a{Ui-i,ti-i) 



we have 



f(qi\qi-i,qi-2, ...,qi) = f{qi\Ui-x, Ui- 2 , ■ . .,ui,u ) 

-p - 



oc |ai_i| 



s [q. 



\<H-i\ 



s[q 



p - bj-i 

\di-l\ 



(5.5) 



where ak = a(uk,tk), bk — b{uk,tk) and s is the step function. Now we can write the 
joint density function. 




, , i. rn iffte ^f.^rr Vie {1,2,. ..,7V} 
JQ[qi,q2,---,qN) ex < i=0 L i a -n i^-ii j (5.6) 



As discussed in the previous section, the restriction of AUi in [— p, p] is a practical 
necessity and does not create any problems in real world applications since p can be 
chosen arbitrarily large, and the points where /q is zero can be viewed as proposals 
of impossible states and rejected immediately. 

Combining equations (|4.8I) and (|5.6p the whole accept-reject probability of the 
MCMC algorithm can be written as 

q(Ax, Ax ) = mm 1, ^ (5.7) 

V / Y d(/i(Ax)) / Q (/i(Ax'))/ 

with/ Yd (y)cx e -y T y/(2A t ). 

As a numerical example for the above procedure consider the linear stochastic 
differential equation 



dX t = pX t dt + crX t dWt , < t < 1 (5.8) 
X = 1 

where p and a are scalar constants. This equation describes the geometric Brown- 
ian motion |14j which finds applications in mathematical finance, particularly in the 
Black-Scholes model of financial markets 4 . This is a good example for demonstra- 
tion purposes because one can obtain its solution analytically. A stochastic process 
satisfying equation (|5.8p will have the form 

X t = X e^ t+aW ^ (5.9) 

2 

where jl = ji — It's pdf has a lognormal distribution, 

f Xt (x,t) = ■ 1 e -^-^x Q )-at)'/(2^t) (5 1Q) 

<jx\J lid, 

and the autocorrelation of X t is given by 



R{s,t) = e M(^+t)( e ^ 2 min( s ,t) _ j j 
9 



(5.11) 




Fig. 5.1. Analytical probability density functions of Xt at t = 0.1, t= 0.5 and t=l compared 
with the empirical pdfs of the simulation data. 




Fig. 5.2. Normalized autocorrelation of Xt at three different time points compared with simu- 
lation data. 



Figures 15.11 and 15.21 display the results of a simulation with the modified MCMC 
algorithm. The scalar constants in equation (|5.8|) were chosen as fi = 1 and a = 0.5. 
The time axis was divided in one hundred equal length intervals with At — 0.01. 
Initially 5.1 x 10 6 samples were generated and the first 10 5 samples were discarded as 
the burn-in length. For this setting p was chosen to be 2 and a uniform distribution 
in [—0.2, 0.2] was used as the proposal distribution for AX. Figure 15.11 shows three 
analytical pdfs at different time points compared with the empirical pdfs obtained 
from the simulation data and figure 15.21 shows the normalized autocorrelation with 
one time point held fixed and the second one varied between and 1. These graphical 
results verify that the sample paths built using our algorithm converge to the desired 
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stationary distribution and hence satisfy the given stochastic differential equation. 

One noteworthy property of numerical solutions using the modified MCMC algo- 
rithm is that all points of a sample path get sampled in parallel as opposed to classical 
iterative methods such as the Euler-Maruyama scheme [TO]. These methods usually 
begin with the initial value Xq, and sample later points of the solution path with an 
iterative update rule given by the difference equation (|5.2j) . For this reason dealing 
with more complicated settings like stochastic boundary value problems of the form 

dX t = b{X u t)At + a(X t ,t)dW t , 0<t<T (5.12) 
h(X ,X T ) = 

becomes troublesome because the points of a sample path are not independent of its 
future values. On the other hand the incorporation of boundary conditions to the 
modified MCMC algorithm is straightforward since the points of the proposed sample 
paths are obtained simultaneously with independent increments. 

6. Conclusion. We have presented a solution to the problem of input variable 
sampling for complex stochastic models with prescribed output distribution. This 
approach is based on a modification to the Metropolis-Hastings algorithm with an 
additional expression which can be viewed as a probing term for the model of interest. 
Our algorithm is easy to implement, benefits from the extensive literature on MCMC 
and hence we believe that it can be adapted to a variety of applications. We have 
demonstrated one such application on general stochastic differential equations viewing 
them from the perspective of stochastic input-output models enabling us to apply our 
algorithm to obtain solution paths. 

Although this paper is based on MCMC, the approach taken to tackle the input 
variable sampling problem docs not require any specific sampling method to be used. 
The algorithm presented here can be implemented equally well with other sampling 
methods once the output distribution for uniformly sampled input variables is worked 
out and therefore offers a fresh approach for dealing with general stochastic models. 
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